22 research outputs found

    On Dependency Analysis via Contractions and Weighted FSTs

    Arc contractions in syntactic dependency graphs can be used to decide which graphs are trees. The paper observes that these contractions can be expressed with weighted finite-state transducers (weighted FSTs) that operate on string-encoded trees. The observation gives rise to a finite-state parsing algorithm that computes the parse forest and extracts the best parses from it. The algorithm is customizable to functional and bilexical dependency parsing, and it can be extended to non-projective parsing via a multi-planar encoding for which high recall has been reported in prior work. Our experiments support an analysis of projective parsing according to which the worst-case time complexity of the algorithm is quadratic in the sentence length and linear in the number of overlapping arcs and in the number of functional categories of the arcs. The results suggest several interesting directions towards efficient and high-precision dependency parsing that takes advantage of the flexibility and the demonstrated ambiguity-packing capacity of such a parser. Peer reviewed.
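    As a plain illustration of the first sentence, the sketch below decides whether an explicit dependency graph is a tree by repeatedly contracting leaf arcs. It is not the paper's method: there is no string encoding and no weighted FST here, and the function name `is_tree_by_contraction`, the node numbering 0..n-1 and the (head, dependent) arc format are assumptions made for the example.

```python
from collections import defaultdict


def is_tree_by_contraction(n, arcs):
    """Decide whether the graph on nodes 0..n-1 is a rooted tree.

    arcs: list of (head, dependent) pairs. A node with exactly one head and
    no remaining dependents is a leaf; contracting its arc merges it into
    its head. The graph is a tree iff repeated leaf contraction leaves a
    single node that has no head.
    """
    heads = defaultdict(list)  # dependent -> heads (duplicates kept)
    deps = defaultdict(set)    # head -> dependents not yet contracted
    for h, d in arcs:
        heads[d].append(h)
        deps[h].add(d)
    alive = set(range(n))
    stack = [v for v in range(n) if not deps[v]]  # initial leaves
    while stack:
        v = stack.pop()
        if deps[v] or len(heads[v]) != 1:
            continue  # a root, a multi-headed node, or not a leaf
        (h,) = heads[v]
        alive.discard(v)     # contract the arc h -> v
        deps[h].discard(v)
        if not deps[h]:
            stack.append(h)  # h may now be a leaf
    return len(alive) == 1 and all(not heads[v] for v in alive)
```

    A cycle or a word with two heads stalls the contraction, and a forest with two roots ends with two nodes, so all three are rejected.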

    Linguistic grammars with very low complexity

    We have presented an overview of the FSIG approach and related FSIG grammars to issues of very low complexity and parsing strategy. We ended up seriously optimistic that most FSIG grammars could be decomposed in a reasonable way and then processed efficiently. Peer reviewed.

    Bounded-Depth High-Coverage Search Space for Noncrossing Parses

    Proceeding volume: 13.
    A recently proposed encoding for noncrossing digraphs can be used to implement generic inference over families of these digraphs and to carry out first-order factored dependency parsing. It is now shown that the recent proposal can be substantially streamlined without information loss. The improved encoding is less dependent on hierarchical processing, and it gives rise to a high-coverage bounded-depth approximation of the space of noncrossing digraphs. This subset is presented elegantly by a finite-state machine that recognises an infinite set of encoded graphs. The set includes more than 99.99% of the 0.6 million noncrossing graphs obtained from the UDv2 treebanks through planarisation. Rather than taking the low probability of the residual as a flat rate, it can be modelled with a joint probability distribution that is factorised into two underlying stochastic processes: the sentence length distribution and the related conditional distribution for deep nesting. This model indicates that deep nesting in the streamlined code requires extreme sentence lengths. High depth is categorically ruled out at common sentence lengths but emerges slowly at infrequent lengths that prompt further inquiry. Peer reviewed.
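    The point that a depth bound lets a finite-state machine recognise an infinite set of bracketed encodings can be sketched with plain brackets. This is a simplification under stated assumptions: the paper's encoding of noncrossing digraphs is richer than balanced `[` `]` strings, and `make_bounded_depth_dfa` and `accepts` are hypothetical helpers. With bound k, the states 0..k simply record the current depth.

```python
def make_bounded_depth_dfa(k):
    """Return (transitions, start, accepting) for a DFA over '[' and ']'
    that accepts balanced bracket strings of nesting depth at most k.
    States 0..k record the current depth; a missing transition rejects."""
    delta = {}
    for d in range(k + 1):
        if d < k:
            delta[(d, "[")] = d + 1
        if d > 0:
            delta[(d, "]")] = d - 1
    return delta, 0, {0}


def accepts(dfa, s):
    delta, state, accepting = dfa
    for ch in s:
        if (state, ch) not in delta:
            return False  # too deep, an unmatched ']', or a foreign symbol
        state = delta[(state, ch)]
    return state in accepting
```

    The machine has k + 1 states yet accepts infinitely many strings, because sequences such as `[][][]...` never exceed the bound.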

    A hierarchy of mildly context-sensitive dependency grammars

    The paper presents Colored Multiplanar Link Grammars (CMLG). These grammars are reducible to extended right-linear S-grammars (Wartena 2001) where the storage type S is a concatenation of c pushdowns. The number of colors available in these grammars induces a hierarchy of classes of CMLGs. By also fixing another parameter in CMLGs, namely the bound t for non-projectivity depth, we get c-Colored t-Non-projective Dependency Grammars (CNDG) that generate acyclic dependency graphs. Thus, CNDGs form a two-dimensional hierarchy of dependency grammars. A part of this hierarchy is mildly context-sensitive and non-projective. Peer reviewed.
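    One informal way to see why one pushdown per color is natural: arcs sharing a color must not cross, and a left-to-right sweep with one stack per color checks exactly that. The sketch below captures only that intuition, assuming arcs are given as hypothetical (left, right, color) triples with left < right; it does not implement Wartena's S-grammars, the concatenated-pushdown storage type, or the non-projectivity depth bound t.

```python
from collections import defaultdict


def colors_are_noncrossing(arcs):
    """arcs: list of (left, right, color) with left < right (word positions).

    Sweep positions left to right with one stack per color: push an arc at
    its left end and require it to be on top of its color's stack at its
    right end. This succeeds iff no two arcs of the same color cross.
    """
    starts, ends = defaultdict(list), defaultdict(list)
    for arc in arcs:
        left, right, _ = arc
        starts[left].append(arc)
        ends[right].append(arc)
    stacks = defaultdict(list)  # color -> stack of open arcs
    for p in sorted(set(starts) | set(ends)):
        # Close arcs ending at p, innermost (largest left end) first.
        for arc in sorted(ends[p], key=lambda a: -a[0]):
            stack = stacks[arc[2]]
            if not stack or stack[-1] != arc:
                return False  # a same-colored arc opened inside is still open
            stack.pop()
        # Open arcs starting at p, outermost (largest right end) first.
        for arc in sorted(starts[p], key=lambda a: -a[1]):
            stacks[arc[2]].append(arc)
    return True
```

    Arcs that share an endpoint do not cross, which is why closing happens before opening at each position.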

    Multiplanarity - a model for dependency structures in treebanks

    Cited several times, e.g. 1. Marco Kuhlmann & Joakim Nivre: Mildly non-projective dependency structures. In Proceedings of the COLING/ACL Main Conference Poster Sessions, pp. 507–514. COLING-ACL '06. Sydney, Australia, 2006. 2. Carlos Gómez-Rodríguez and Joakim Nivre: A transition-based parser for 2-planar dependency structures. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1492–1501, Uppsala, Sweden, 11-16 July 2010. ACL. 3. Marco Kuhlmann: Dependency Structures and Lexicalized Grammars. An Algebraic Approach. LNAI 6270. FoLLI Publications on Logic, Language and Information. Springer 2010. 4.
    The number of treebanks available for different languages is growing steadily. A considerable portion of the recent treebanks use annotation schemes that are based on dependency syntax. In this paper, we give a model for linguistically adequate classes of dependency structures in treebanks. Our model is tested using the Danish Dependency Treebank. Lecerf’s projectivity hypothesis assumes a constraint on linear word order in dependency analyses. Unfortunately, projectivity does not lend itself to adequate treatment of certain non-local syntactic phenomena that are extensively studied in the literature of constituent-based theories such as TG, GB, GPSG, TAG, and LFG. Among these phenomena are scrambling, topicalization, WH-movement, cleft sentences, discontinuous NPs, and discontinuous negation. A few relaxed models somewhat similar to projectivity have been proposed. These include quasi-projectivity, planarity, pseudo-projectivity, meta-projectivity, and polarized dependency grammars. None of these models is motivated by formal language theory. The current work presents a new word-order model with a clear connection to formal language theory. The model, multiplanarity with a bounded number of planes, is based on planarity, which is itself a generalization of projectivity. Peer reviewed.
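    Under multiplanarity with a bounded number of planes, a structure fits the model when its arcs can be split into at most k planes, each of which is noncrossing. Deciding this amounts to coloring the graph whose vertices are arcs and whose edges join crossing arcs. The backtracking sketch below assumes arcs as (head, dependent) position pairs and uses the hypothetical names `crosses` and `assign_planes`; it is exponential in the worst case and only meant to make the definition concrete.

```python
from itertools import combinations


def crosses(a, b):
    """Arcs cross iff exactly one endpoint of one lies strictly inside the
    other; arc direction is ignored."""
    (a1, a2), (b1, b2) = sorted(a), sorted(b)
    return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2


def assign_planes(arcs, k):
    """Return a plane index in range(k) for each arc so that no two arcs on
    the same plane cross, or None if k planes do not suffice."""
    n = len(arcs)
    conflicts = [[] for _ in range(n)]  # edges of the crossing graph
    for i, j in combinations(range(n), 2):
        if crosses(arcs[i], arcs[j]):
            conflicts[i].append(j)
            conflicts[j].append(i)
    plane = [None] * n

    def place(i):
        if i == n:
            return True
        for p in range(k):
            if all(plane[j] != p for j in conflicts[i]):
                plane[i] = p
                if place(i + 1):
                    return True
        plane[i] = None  # undo and backtrack
        return False

    return plane if place(0) else None
```

    For k = 2 the same question reduces to testing whether the crossing graph is bipartite, which needs no backtracking.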

    Regular approximations through labeled bracketing

    The editors do not seem to have compiled the revised papers into post-proceedings. A similar pre-proceedings volume contains the original version, but the revised version of the article is much better.
    This paper presents an approximation method that is based on a new representation theorem for context-free languages. According to it, any context-free language can be represented as a homomorphic image of an intersection of a set of constraint languages defining properties of valid labeled bracketings. The intersected languages of the new theorem differ from the ones used in the famous theorem by Chomsky and Schützenberger (1963). If these constraint languages are restricted to make them regular, we obtain a new kind of compact representation for regular approximations. The resulting approximation can be chosen to be either a subset or a superset of the original context-free language. Peer reviewed.
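    The idea that regular constraint languages yield subset or superset approximations can be illustrated with a toy case. Assume the target language {a^n b^n : n >= 0}, written as the homomorphic image of bracket strings under h('[') = a and h(']') = b. Bounding the nesting depth of the balance constraint gives a regular subset; dropping the balance constraint gives a regular superset. This uses unlabeled brackets and hypothetical helper names, not the labeled-bracketing theorem of the paper.

```python
import re
from itertools import product


def in_cfl(w):
    """Toy context-free target language {a^n b^n : n >= 0}."""
    n = len(w) // 2
    return w == "a" * n + "b" * n


def to_brackets(w):
    """Preimage of w under the homomorphism h('[') = 'a', h(']') = 'b'."""
    return w.replace("a", "[").replace("b", "]")


def max_depth_if_balanced(s):
    """Maximum nesting depth of s if s is balanced, else None."""
    depth = deepest = 0
    for ch in s:
        depth += 1 if ch == "[" else -1
        if depth < 0:
            return None
        deepest = max(deepest, depth)
    return deepest if depth == 0 else None


def opens_before_closes(s):
    """Regular constraint: every '[' precedes every ']'."""
    return re.fullmatch(r"\[*\]*", s) is not None


def subset_approx(w, k):
    """Regular subset: ordering plus balance with nesting depth <= k."""
    s = to_brackets(w)
    d = max_depth_if_balanced(s)
    return opens_before_closes(s) and d is not None and d <= k


def superset_approx(w):
    """Regular superset: keep the ordering, drop the balance constraint."""
    return opens_before_closes(to_brackets(w))


def sandwich_holds(max_len, k):
    """Check subset_approx <= target <= superset_approx on all strings
    over {a, b} of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if subset_approx(w, k) and not in_cfl(w):
                return False
            if in_cfl(w) and not superset_approx(w):
                return False
    return True
```

    With k = 2 the subset is {ε, ab, aabb} and the superset is a*b*, so the target language lies between two regular languages.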

    On finite-state phonology and autosegmental representations

    Proceeding volume: 11.
    Building finite-state transducers from written autosegmental grammars of tonal languages involves compiling the rules into a notation provided by the finite-state tools. This work tests a simple, human-readable approach to compiling and debugging autosegmental rules using a simple string encoding for autosegmental representations. The proposal is based on brackets that mark the edges of the tone autosegments. The bracket encoding of the autosegments is compact and directly human-readable. The paper also presents an ordinary finite-state transducer for transforming a concatenated string of lexemes, where each lexeme (such as “babaa|HH”) consists of a segmental substring and a tonal substring, into a chronological master string (“b[a]b[aa]”) where the tone autosegments are associated with their segmental spans. Peer reviewed.
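    The lexeme-to-master-string step can be sketched for the single example given, “babaa|HH” to “b[a]b[aa]”. The sketch assumes that tones associate one-to-one, left to right, with maximal vowel runs and that the vowels are a, e, i, o, u; like the example, its brackets carry no tone labels. It is a Python function handling one lexeme, not the finite-state transducer the paper presents, and `to_master_string` is a hypothetical name.

```python
VOWELS = set("aeiou")  # assumption: tone-bearing units are maximal vowel runs


def to_master_string(lexeme):
    """Turn 'segments|tones' into a chronological master string in which
    each tone autosegment is written as brackets around its segmental span.

    Assumes one-to-one, left-to-right association of tones with vowel runs.
    """
    segments, tones = lexeme.split("|")
    # Split the segmental string into maximal vowel and non-vowel runs.
    runs = []
    for ch in segments:
        if runs and (ch in VOWELS) == (runs[-1][0] in VOWELS):
            runs[-1] += ch
        else:
            runs.append(ch)
    vowel_runs = [r for r in runs if r[0] in VOWELS]
    if len(vowel_runs) != len(tones):
        raise ValueError("one-to-one association needs one tone per vowel run")
    return "".join(f"[{r}]" if r[0] in VOWELS else r for r in runs)
```

    For instance, `to_master_string("babaa|HH")` returns `"b[a]b[aa]"`; a lexeme whose tone count does not match its vowel runs is rejected rather than guessed.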

    An efficient Constraint Grammar parser based on inward deterministic automata

    Proceeding volume: 14 (2011).
    The paper reconceptualizes Constraint Grammar as a framework where the rules refine the compact representations of local ambiguity while the rule conditions are matched against a string of feature vectors that summarize the compact representations. Both views of the ambiguity are processed with pure finite-state operations. The compact representations are mapped to feature vectors with the aid of a rational power series. This magical interconnection is no less pure than the prevalent interpretation, which requires that the set of readings provided by a lexical transducer be magically linearized into a marked concatenation of readings that is given to pure transducers. The current approach has several practical benefits, including the inward deterministic way to compute, represent and maintain all the applications of the rules in the sentence. Peer reviewed.
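    To make the two views concrete, the toy sketch below keeps each word's remaining readings as a set (standing in for the compact representation of local ambiguity) and summarizes each set as a bit vector of tags (standing in for the feature vector). The rule condition is then tested on a neighbour's vector only. This is a conventional illustration, not the paper's finite-state implementation or its rational power series; the tag inventory, the function names and the single rule written in CG-style notation are assumptions.

```python
# Hypothetical tag inventory: each tag is one bit of a feature vector.
TAGS = {"DET": 1, "N": 2, "V": 4}


def feature_vector(readings):
    """Summarize a cohort (set of remaining readings) as an OR of tag bits."""
    vec = 0
    for tag in readings:
        vec |= TAGS[tag]
    return vec


def remove_v_after_det(sentence):
    """Apply the toy rule 'REMOVE (V) IF (-1C (DET))' in one parallel pass.

    Conditions are checked against feature vectors computed before the
    pass: the preceding cohort must contain DET and nothing else (careful
    mode). The last remaining reading of a cohort is never removed.
    """
    vectors = [feature_vector(c) for c in sentence]
    out = []
    for i, cohort in enumerate(sentence):
        if i > 0 and vectors[i - 1] == TAGS["DET"] and "V" in cohort and len(cohort) > 1:
            cohort = cohort - {"V"}  # refine the local ambiguity
        out.append(cohort)
    return out
```

    Because the condition only inspects an integer summary, the matching side never needs to enumerate the readings of the neighbouring word.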

    Describing syntax with star-free regular expressions

    Has been cited by: 1. Nathan Vaillette. Dissertation. 2004. 2. András Kornai. Mathematical Linguistics. Springer Verlag. 2008. 3. Mans Hulden. Regular Expressions and Predicate Logic in Finite-State Language Processing. Proceedings of the 2009 Conference on Finite-State Methods and Natural Language Processing: Post-proceedings of the 7th International Workshop FSMNLP 2008, pp. 82–97, July 11, 2009.
    Proceeding volume: 10.
    Syntactic constraints in Koskenniemi’s Finite-State Intersection Grammar (FSIG) are logically less complex than their formalism (Koskenniemi et al., 1992) would suggest: it turns out that although the constraints in Voutilainen’s (1994) FSIG description of English make use of several extensions to regular expressions, the description as a whole reduces to a finite combination of union, complement and concatenation. This is an essential improvement to the descriptive complexity of ENGFSIG. The result opens the door to further analysis of the logical properties of FSIG descriptions and to their possible optimization. The proof contains a new formula for compiling Koskenniemi’s restriction operation without any marker symbols. Peer reviewed.
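    The restriction operation, and the reduction to union, complement and concatenation, can be sketched for the simplest case: a single-symbol center. The formula used below, ¬(¬(Σ*b) a Σ*) ∩ ¬(Σ* a ¬(cΣ*)) for a => b _ c, is the standard textbook compilation for that case, not the paper's new marker-free formula for general restrictions. Languages are modelled as membership predicates (an assumption made for brevity), Σ* is the complement of the empty language, and intersection is obtained by De Morgan's law, so only the three star-free operations are used.

```python
from typing import Callable

Lang = Callable[[str], bool]  # a language, modelled as a membership test


def lit(w: str) -> Lang:
    return lambda s: s == w


def union(a: Lang, b: Lang) -> Lang:
    return lambda s: a(s) or b(s)


def compl(a: Lang) -> Lang:
    return lambda s: not a(s)


def concat(a: Lang, b: Lang) -> Lang:
    # s is in AB iff some split point puts the prefix in A and the suffix in B.
    return lambda s: any(a(s[:i]) and b(s[i:]) for i in range(len(s) + 1))


EMPTY: Lang = lambda s: False
SIGMA_STAR: Lang = compl(EMPTY)  # Σ* needs no star: it is the complement of ∅


def intersect(a: Lang, b: Lang) -> Lang:
    return compl(union(compl(a), compl(b)))  # De Morgan


def restrict(center: str, left: str, right: str) -> Lang:
    """center => left _ right, valid for a single-symbol center only."""
    # Some occurrence of center is preceded by a string outside Σ* left ...
    bad_left = concat(concat(compl(concat(SIGMA_STAR, lit(left))), lit(center)), SIGMA_STAR)
    # ... or some occurrence is followed by a string outside right Σ*.
    bad_right = concat(concat(SIGMA_STAR, lit(center)), compl(concat(lit(right), SIGMA_STAR)))
    return intersect(compl(bad_left), compl(bad_right))
```

    Membership is decided by brute-force splitting, which is fine for short test strings; a practical compiler would build automata for the same expression instead.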

    Forgotten Islands of Regularity in Phonology

    Open access publication of this volume is supported by National Research, Development and Innovation Office grant NKFIH #120145 'Deep Learning of Morphological Structure'.
    The birth of finite-state phonology is classically attributed to Johnson (1972) and Kaplan and Kay (1994). However, there is an earlier discovery that came very close to this achievement. In 1965, Hennie presented a very general sufficient condition for the regularity of Turing machines. Although this discovery happened chronologically before Generative Phonology (Chomsky and Halle, 1968), it is a mystery why its relevance was not realized until recently (Yli-Jyrä, 2017). Hennie's early work is general enough to advance even today’s frontier of finite-state phonology. First, it lets us construct a finite-state transducer from any grammar implemented by a tightly bounded one-tape Turing machine: if the machine runs in o(n log n) time, the construction is possible, and this case is reasonably decidable. Second, it can be used to model the regularity in context-sensitive derivations. For example, the suffixation in hunspell dictionaries (Németh et al., 2004) corresponds to time-bounded two-way computations performed by a Hennie machine. Third, it challenges us to look for new forgotten islands of regularity where Hennie’s condition does not necessarily hold.
    Hennie presented a very general sufficient condition for the regularity of Turing machines. This happened chronologically before Generative Phonology (Chomsky & Halle 1968) and the related finite-state research (Johnson 1972; Kaplan & Kay 1994). Hennie’s condition lets us (1) construct a finite-state transducer from any grammar implemented by a linear-time Turing machine, and (2) model the regularity in context-sensitive derivations. For example, the suffixation in hunspell dictionaries (Németh et al. 2004) corresponds to time-bounded two-way computations performed by a Hennie machine. Furthermore, it challenges us to look for new forgotten islands of regularity where Hennie’s condition does not necessarily hold. Peer reviewed.
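    Assuming the common formulation of Hennie's condition as a constant bound on how often the head visits any tape cell, the sketch below simulates a deterministic one-tape Turing machine and reports the largest visit count. The machine `DELTA` is a hypothetical toy that loosely mimics a two-way suffixation step: it walks right over a stem in {a, b}, writes the suffix s, and walks back. It is not a hunspell rule, and its visit bound does not grow with the stem length.

```python
from collections import defaultdict

BLANK = "_"

# Hypothetical toy machine: walk right over the stem, write the suffix "s"
# on the first blank cell, walk back to the left end, and halt there.
DELTA = {
    ("R", "a"): ("R", "a", +1),
    ("R", "b"): ("R", "b", +1),
    ("R", BLANK): ("L", "s", -1),
    ("L", "a"): ("L", "a", -1),
    ("L", "b"): ("L", "b", -1),
    ("L", BLANK): ("H", BLANK, +1),
}


def run_tm(delta, word, start, halt, max_steps=10_000):
    """Run a deterministic one-tape TM; return (tape contents, max visits).

    A visit is counted each time the head is on a cell in some
    configuration, including the final halting one. Under the assumed
    Hennie-style condition, this maximum is bounded by a constant that
    does not depend on the input length.
    """
    tape = dict(enumerate(word))
    visits = defaultdict(int)
    state, pos = start, 0
    for _ in range(max_steps):
        visits[pos] += 1
        if state == halt:
            break
        state, write, move = delta[(state, tape.get(pos, BLANK))]
        tape[pos] = write
        pos += move
    else:
        raise RuntimeError("step limit exceeded")
    lo, hi = (min(tape), max(tape)) if tape else (0, -1)
    cells = "".join(tape.get(i, BLANK) for i in range(lo, hi + 1))
    return cells.strip(BLANK), max(visits.values())
```

    For example, `run_tm(DELTA, "ab", "R", "H")` returns `("abs", 3)`, and a longer stem such as `"abba"` still yields a maximum of 3 visits.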